Clustered Checkpointing and Partial Rollbacks for Reducing Conflict Costs in STMs
Abstract
Software Transactional Memory (STM) is a concurrency control mechanism that executes multiple concurrent, optimistic, lock-free, atomic transactions, thereby alleviating many problems associated with conventional mutual exclusion primitives such as monitors and locks. With the advent of massive multi-cores, more transactions can be initiated concurrently, but this also increases the percentage of conflicting transactions. Each time a transaction conflicts, it imposes a significant cost on the system, because it must abort and redo all of its operations, including the costly shared memory read operations, which makes the overall system heavy and impractical. We present an algorithm, Clustered Checkpointing and Partial Rollback (CCPR), for reducing the conflict costs of transactions in the face of increasing conflicts. The algorithm checkpoints transactions intelligently as they proceed and, in case of conflict, rolls them back to a safe, consistent, intermediate checkpoint, thus reducing conflict costs. The intelligence of the algorithm lies in its adaptivity: as conflicts decrease, checkpointing costs fall; as conflicts increase, checkpointing costs rise but remain considerably lower than the savings obtained from partially rolling back the conflicting transactions. We simulated several applications in the CCPR framework and found that it can reduce the conflict costs originating from the need to redo all shared memory read operations by up to 17%.
General Terms: Concurrent Programming, Software Transactional Memory
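The difference between a full abort and a rollback to an intermediate checkpoint can be illustrated with a toy, single-threaded model. This is a minimal sketch and not the CCPR algorithm itself: the abstract does not specify how checkpoints are clustered, so the fixed-interval policy (`checkpoint_every`), the `Transaction` class, and the helper `compare_redo_costs` are all hypothetical names introduced here purely for illustration. The sketch only counts how many shared reads have to be redone after a conflict.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Toy transaction that logs shared reads and takes periodic checkpoints."""
    read_addrs: list                 # shared addresses read, in program order
    checkpoint_every: int = 2        # ASSUMPTION: fixed-interval policy, not CCPR's clustering
    checkpoints: list = field(default_factory=list)  # read positions that are safe restart points
    read_log: list = field(default_factory=list)     # (address, value) pairs read so far
    reads_executed: int = 0          # total shared reads performed, including redone ones

    def run(self, memory, start=0):
        """Execute reads from position `start`, recording checkpoints along the way."""
        for pos in range(start, len(self.read_addrs)):
            if pos % self.checkpoint_every == 0 and pos not in self.checkpoints:
                self.checkpoints.append(pos)
            addr = self.read_addrs[pos]
            self.read_log.append((addr, memory[addr]))
            self.reads_executed += 1

    def partial_rollback(self, memory, conflict_addr):
        """Restart from the latest checkpoint at or before the first read of conflict_addr."""
        positions = [i for i, (a, _) in enumerate(self.read_log) if a == conflict_addr]
        if not positions:
            return  # the conflicting address was never read: nothing to undo
        restart = max(c for c in self.checkpoints if c <= positions[0])
        del self.read_log[restart:]  # discard only the reads made after the checkpoint
        self.checkpoints = [c for c in self.checkpoints if c <= restart]
        self.run(memory, start=restart)

    def full_abort(self, memory):
        """Conventional behaviour: discard everything and redo every read."""
        self.read_log.clear()
        self.checkpoints.clear()
        self.run(memory, start=0)


def compare_redo_costs():
    """Return (partial, full) transactions after the same conflict on address 'e'."""
    addrs = ["a", "b", "c", "d", "e", "f"]
    memory = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
    partial, full = Transaction(list(addrs)), Transaction(list(addrs))
    partial.run(memory)
    full.run(memory)
    memory["e"] = 50                 # simulated concurrent commit that invalidates 'e'
    partial.partial_rollback(memory, "e")
    full.full_abort(memory)
    return partial, full


partial, full = compare_redo_costs()
print("reads redone with partial rollback:", partial.reads_executed - 6)
print("reads redone with full abort:", full.reads_executed - 6)
```

In this six-read example the conflict on `e` forces the partially rolled back transaction to redo two reads, while the fully aborted one redoes all six; both end with the same read log. A real STM would also have to validate versions, restore local state at the checkpoint, and weigh the cost of taking checkpoints, none of which is modelled here.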
Related resources
LARKTM: Efficient, Strongly Atomic Software Transactional Memory
Software transactional memory provides an appealing alternative to locks by improving programmability, reliability, and scalability without relying on custom hardware. However, existing STMs are impractical because they add high overhead and provide weak semantics—or they provide strong atomicity semantics and add even higher overhead. Existing STMs are impractical largely due to the cost of co...
Exploring Checkpointing and Closed Nesting in Distributed Transactional Memory
Checkpointing and closed nesting are mechanisms typically used for implementing partial roll-back in transactional systems. Closed nesting limits the amount of work to redo on an abort by allowing sub-transactions to abort and retry independently from their parents. Checkpointing goes further and allows a transaction to be rolled back to any previous point where a checkpoint was saved. Checkpoi...
Using Reflection for Checkpointing Concurrent Object Oriented Programs
This paper presents a reflective approach to checkpointing concurrent object oriented programs. We describe a checkpointing and rollback library for multithreaded programs written in C++. We demonstrate some of the unique features offered by this library, such as selective checkpointing and selective rollbacks of threads of a process that are achievable only through the use of reflection.
Management of Fault Tolerance Information for Coordinated Checkpointing Protocol without Sympathetic Rollbacks
This paper presents the condition for an extended global recovery line for coordinated checkpointing protocol and a new garbage collection protocol on checkpoints and message logs in order to avoid the sympathetic rollback caused by lost messages. Since previous works assumed the communication channel does not lose the in-transit messages, those works on garbage collection in coordinated checkp...
An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environment. In the scheme suggested nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...
Publication date: 2010